Recognition of Structured Collocations in an Inflective Language
Abstract
We present a method for extracting structural collocations in an inflective language (Polish). The process is divided into two phases: (1) extraction and filtering of pairs of word forms reduced to their base forms, and (2) structural annotation of the extracted collocations with lexico-syntactic patterns. The parameters of the patterns are specified manually, but their instances are generated and tested on the corpus automatically. The extracted collocations were evaluated by applying them as rules in the morpho-syntactic disambiguation of Polish and by comparing them with lists of two-word expressions extracted from two Polish dictionaries.
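The two phases can be illustrated with a minimal sketch. The abstract does not name the association measure, the co-occurrence window, the tagset, or the exact form of the patterns, so everything below is an assumption for illustration only: tokens are presumed already lemmatized and tagged (the hypothetical `Token` fields `pos`, `num`, `case`, `gend`), phase 1 scores adjacent lemma pairs with pointwise mutual information (PMI) after a frequency cut-off, and phase 2 tests one hypothetical pattern instance (adjective + noun agreeing in number, case and gender) against every corpus occurrence of a pair. This is not the authors' implementation.

```python
from collections import Counter
from math import log2
from typing import NamedTuple


class Token(NamedTuple):
    form: str   # inflected word form as it appears in text
    lemma: str  # base form (assumed to come from a tagger)
    pos: str    # hypothetical coarse part of speech: "adj", "subst", "verb"
    num: str    # grammatical number, e.g. "sg"
    case: str   # grammatical case, e.g. "nom"; "-" if not applicable
    gend: str   # grammatical gender, e.g. "f"; "-" if not applicable


def extract_pairs(sentences, window=1, min_freq=2):
    """Phase 1 (sketch): count lemma pairs within `window`, drop rare ones, score by PMI."""
    unigrams, pairs, total = Counter(), Counter(), 0
    for sent in sentences:
        lemmas = [t.lemma for t in sent]  # reduce word forms to base forms
        unigrams.update(lemmas)
        total += len(lemmas)
        for i, left in enumerate(lemmas):
            for right in lemmas[i + 1:i + 1 + window]:
                pairs[(left, right)] += 1
    n_pairs = sum(pairs.values())
    scores = {}
    for (a, b), freq in pairs.items():
        if freq < min_freq:  # frequency filter (threshold is an assumption)
            continue
        p_ab = freq / n_pairs
        p_a, p_b = unigrams[a] / total, unigrams[b] / total
        scores[(a, b)] = log2(p_ab / (p_a * p_b))
    return scores


def adj_noun_agreement(t1, t2):
    """Hypothetical pattern: adjective followed by a noun agreeing in number, case, gender."""
    return (t1.pos == "adj" and t2.pos == "subst"
            and (t1.num, t1.case, t1.gend) == (t2.num, t2.case, t2.gend))


def annotate(pair, sentences, pattern, min_support=0.8):
    """Phase 2 (sketch): accept the pattern if it holds for enough adjacent occurrences."""
    a, b = pair
    hits = matched = 0
    for sent in sentences:
        for t1, t2 in zip(sent, sent[1:]):
            if (t1.lemma, t2.lemma) == (a, b):
                hits += 1
                matched += int(pattern(t1, t2))
    return hits > 0 and matched / hits >= min_support


# Toy corpus: "czarny owca" (black sheep) in three different cases.
corpus = [
    [Token("czarna", "czarny", "adj", "sg", "nom", "f"),
     Token("owca", "owca", "subst", "sg", "nom", "f")],
    [Token("czarnej", "czarny", "adj", "sg", "gen", "f"),
     Token("owcy", "owca", "subst", "sg", "gen", "f")],
    [Token("widzę", "widzieć", "verb", "sg", "-", "-"),
     Token("czarną", "czarny", "adj", "sg", "acc", "f"),
     Token("owcę", "owca", "subst", "sg", "acc", "f")],
]

scores = extract_pairs(corpus, window=1, min_freq=2)
```

On this toy corpus only the pair `("czarny", "owca")` survives the frequency filter, and `annotate(("czarny", "owca"), corpus, adj_noun_agreement)` accepts the agreement pattern because it holds in all three inflected occurrences. Counting on base forms is what lets the differently inflected forms `czarna owca`, `czarnej owcy` and `czarną owcę` pool into one pair, which matters in an inflective language.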